Add Loquacious huggingface to bliss jobs #626

JackTemaki · 2025-09-29T08:00:55Z

Basic jobs to use Loquacious in our standard pipelines. So far this only contains jobs for dev/test and for the small and medium train sets. The large corpus needs extra handling.

The jobs require an existing huggingface cache directory.

Co-authored-by: Nick Rossenbach <[email protected]>
Co-authored-by: Robin Schmitt <[email protected]>

albertz · 2025-10-03T11:32:07Z

datasets/loquacious.py

+            "-c:a",
+            "libvorbis",
+            "-b:a",
+            "16k",


I was checking some other examples.

What I also see:
["ffmpeg", "-y", "-f", "s16le", "-ar", "%i" % sr, "-i", "pipe:0", "-c:a", "libvorbis", "-q", "3.0", path]
(That's what you use for your TTS Ogg export.)
["ffmpeg", "-hide_banner", "-loglevel", "error", "-y", "-threads", "1", "-f", "s16le", "-ar", "%i" % sr, "-i", "pipe:0", "-c:a", "libvorbis", "-q", "3.0", path]

Or in i6_experiments.common.datasets.librispeech.corpus.get_bliss_corpus_dict and i6_experiments.common.datasets.tedlium2.corpus.get_bliss_corpus_dict (and many others), we use "output_format": "ogg", "codec": "libvorbis" and sample_rate=16000 for BlissChangeEncodingJob. I wonder a bit about that: Here we don't specify the quality at all (neither -b:a nor -q), as far as I can see?

I have not seen any other example using -b:a. This corresponds to the fixed_bitrate option in BlissChangeEncodingJob.

Using a target average bitrate (ABR, option -b:a) seems suboptimal to me. Wouldn't a quality-based variable bitrate (VBR, option -q) make more sense?

But I just see that -q 3 is already the default. And you said that the FFmpeg defaults are suboptimal? Maybe we should use -q 4 or higher?
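To make the two variants discussed above easier to compare, here is a minimal sketch of a helper that builds the ffmpeg argument list either with a VBR quality (-q) or with a target bitrate (-b:a). The function name `build_vorbis_cmd` and its parameters are hypothetical and not part of this PR; the flags themselves are the ones quoted in this thread.

```python
from typing import List, Optional


def build_vorbis_cmd(
    path: str,
    sample_rate: int,
    quality: Optional[float] = None,
    bitrate: Optional[str] = None,
) -> List[str]:
    """Build an ffmpeg argument list that reads raw s16le PCM from stdin
    and writes an Ogg Vorbis file via libvorbis.

    Hypothetical helper for illustration only.
    """
    if quality is not None and bitrate is not None:
        raise ValueError("pass either quality (-q) or bitrate (-b:a), not both")

    cmd = [
        "ffmpeg", "-hide_banner", "-loglevel", "error", "-y",
        "-f", "s16le", "-ar", str(sample_rate), "-i", "pipe:0",
        "-c:a", "libvorbis",
    ]
    if quality is not None:
        # Quality-based VBR, as in the TTS Ogg export quoted above.
        cmd += ["-q", str(quality)]
    elif bitrate is not None:
        # Target bitrate, as in the diff under review.
        cmd += ["-b:a", bitrate]
    # With neither option set, the encoder default applies.
    cmd.append(path)
    return cmd
```

For example, `build_vorbis_cmd("out.ogg", 16000, quality=4.0)` ends in `-q 4.0 out.ogg`, while `build_vorbis_cmd("out.ogg", 16000, bitrate="16k")` reproduces the `-b:a 16k` variant from the diff.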

Oh, I just learned: If you don't pass -c:a libvorbis, and you have some weird stripped-down FFmpeg build which was built without libvorbis, then FFmpeg still provides a builtin vorbis encoder, so it can still generate ogg files, but the quality will just be (much?) lower.

In some older setups, I did not use -c:a libvorbis, but simply ffmpeg -i ... out...ogg. But I think in most of my environments, I always had my custom Linuxbrew ffmpeg installed, which should have libvorbis enabled.

But the FFmpeg on the RWTH HPC cluster (which is only available after module load FFmpeg) does not support libvorbis.
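One way to guard against such builds would be to check the encoder list before running a job. The following is only a sketch: the helper names `has_encoder` and `ffmpeg_has_encoder` are hypothetical, and the parsing assumes the usual `ffmpeg -encoders` layout, where each encoder line has a flags column followed by the encoder name.

```python
import shutil
import subprocess


def has_encoder(encoders_output: str, name: str) -> bool:
    """Return True if `name` appears as an encoder name in the text
    printed by `ffmpeg -hide_banner -encoders`.

    Assumes each encoder line looks like: " A....D libvorbis   description".
    """
    for line in encoders_output.splitlines():
        fields = line.split()
        # fields[0] is the flags column, fields[1] the encoder name.
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_has_encoder(name: str = "libvorbis", ffmpeg: str = "ffmpeg") -> bool:
    """Query the given ffmpeg binary; returns False if it is not on PATH."""
    if shutil.which(ffmpeg) is None:
        return False
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_encoder(result.stdout, name)
```

A job could call `ffmpeg_has_encoder("libvorbis")` up front and fail early with a clear message instead of silently falling back to a different encoder.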

JackTemaki and others added 2 commits September 29, 2025 09:57

ruff

94c4a58

albertz reviewed Oct 3, 2025


albertz mentioned this pull request Oct 4, 2025

TransformAndMapHuggingFaceDatasetJob and ExtractTextFromHuggingFaceDatasetJob #627

Open
